Introduction

New York City aims to reduce the amount of waste disposed of by 90 percent by 2030 and send zero waste to landfills by that point, according to its new plan for the city [1].

This project aims to determine how much waste New York City (NYC) residents recycle and whether recycling varies among different districts in NYC. If a difference does exist, we hope to uncover possible explanations for it. These differences may be explained by socioeconomic factors such as income and educational attainment. The amount of recycling in each district might also be affected by the number of public recycling bins in the district or by the average distance to the nearest recycling bin.

Motivation

Considering that recycling levels differ among different districts in New York City, it would be very interesting to learn the reasons or explanations behind these differences. Understanding why recycling is successful in some areas and unsuccessful in others may inspire new initiatives to promote recycling in New York City. In addition to being great for the environment, it would also benefit NYC’s residents as recycling reduces pollution, conserves energy, reduces the need for new raw materials, and keeps trash out of landfills.

The knowledge gained could possibly be applied to other cities in the US or around the world. The United Nations’ Sustainable Development Goals [2] 11 and 12 are concerned with sustainable cities and responsible consumption, which are areas that this project could affect positively.

Below is a video that summarizes our project idea:

Data sources

In order to investigate how the level of recycling differs between New York districts and how these levels are affected by socioeconomic factors, we need a few different data sources. Most importantly, we need recycling and social data but also geographical data for visualization purposes and for relating e.g. the locations of public recycling bins to the districts.


Recycling data

Our main recycling data set is provided by the Department of Sanitation in NYC and provides information about the waste recycling rates, i.e. the amount of waste recycled as a fraction of the total waste stream, in each district for 17 different months in 2018 and 2019. The recycling rates are provided for paper and for metal, glass, plastic & beverage cartons, as well as the total rate, which combines the two. The data set is available here.
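To make the definition of a recycling rate concrete, here is a minimal Python sketch. The function name and the tonnages are hypothetical and only illustrate the definition; the data set itself already provides the rates, and the sketch assumes that the total rate is the combined recycled tonnage divided by the total waste stream.

```python
def recycling_rate(recycled_tons: float, total_tons: float) -> float:
    """Recycled waste as a fraction of the total waste stream."""
    if total_tons <= 0:
        raise ValueError("total_tons must be positive")
    return recycled_tons / total_tons


# Hypothetical tonnages for one district in one month (not real figures).
paper_tons = 120.0
mgp_tons = 80.0            # metal, glass, plastic & beverage cartons
total_waste_tons = 1000.0  # everything collected, recycled or not

paper_rate = recycling_rate(paper_tons, total_waste_tons)
mgp_rate = recycling_rate(mgp_tons, total_waste_tons)
# Assumption: the total rate combines both recycled streams.
total_rate = recycling_rate(paper_tons + mgp_tons, total_waste_tons)
```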

Our second source of recycling data is a register of the geographical locations of all public recycling bins in New York City. This data set is also provided by the Department of Sanitation and is available here.


Social data

To get insights about the socio-economics of each of the 59 community districts in New York City, we make use of a data set provided by the Department of City Planning in NYC. This data set contains detailed demographic, socio-economic and housing characteristics for the different districts. The values in the data set are provided as 5-year averages over 2015 to 2019. The data set is quite extensive and contains roughly 600 characteristics in total. We only make use of a subset of these characteristics, for example the median age, median household income, percentage with a Bachelor’s degree and median number of household rooms in each district. The data set is available here.


Geo data

To create map visualizations, we make use of a data set that contains shape information of the different community districts. This data set is available here and is provided by the Department of Sanitation in NYC.

We also make use of the PLUTO data set provided by the Department of City Planning, which contains coordinates for all tax lots in NYC, amongst other things. We use this data to compute the distance to the nearest public recycling bin for all tax lots (i.e. addresses) in NYC. The data set is available here.
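The nearest-bin computation described above could look roughly like the following sketch. It assumes that both tax lots and bins are given as (latitude, longitude) pairs in degrees and uses the haversine formula with a brute-force scan; the function names and coordinates are hypothetical, and the explainer notebook may use a different method.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_bin_km(lot, bins):
    """Distance from one tax lot to its closest bin (brute-force scan)."""
    return min(haversine_km(lot[0], lot[1], b[0], b[1]) for b in bins)


# Hypothetical coordinates, for illustration only.
bins = [(40.7128, -74.0060), (40.7306, -73.9866)]
lots = [(40.7130, -74.0055), (40.6500, -73.9500)]
distances = [nearest_bin_km(lot, bins) for lot in lots]
```

A brute-force scan compares every tax lot with every bin, so for the full PLUTO data a spatial index (for example a k-d tree) would likely be a faster choice.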


NYC districts and their public recycling bins

The map below displays the 59 community districts in New York City as well as the registered public recycling bins in New York City. The map reveals that there is a high concentration of public recycling bins in lower Manhattan and Northern Brooklyn but a lower concentration in suburban areas.

New York City districts and public recycling bins

To further investigate the availability of public recycling bins in the different community districts, we visualize the distance to the nearest recycling bin for all locations in New York City. Not surprisingly, we see that the distance to the nearest public recycling bin is in general much lower for people living in Manhattan and Northern Brooklyn than it is for people living in suburban areas, such as Queens or Staten Island. We also notice that there are areas in the middle of Brooklyn where the nearest public recycling bin is almost 5 km away!

Distance to nearest public recycling bin

Recycling over time

Next, we turn our focus to how the recycling behaviour of New Yorkers has changed over the past few years. In the map below the recycling rate, i.e. the amount of waste recycled as a fraction of the total waste stream, is shown for 17 months throughout 2018 and 2019. The recycling rates for paper and for metal, glass, plastic & beverage cartons are visualized as well as the total recycling rate. The map reveals that the recycling rate is in general lower for paper than it is for metal, glass, plastic & beverage cartons. But we also see that all three recycling rates increase from the beginning of 2018 to the end of 2019 for almost all districts in the city. It seems that New Yorkers have become increasingly aware of recycling, kudos to the people of New York!

Recycling over time
Here is a link to the map if it does not load.

Actually, it turns out that a recycling campaign was initiated in New York City in 2018 [1]. Our visualization above thus suggests that the campaign likely had a positive effect on the recycling behaviour of New Yorkers.

Recycling and socio-economic factors

Now we’ll turn to the social aspects of our data. The map below shows a variety of socio-economic factors by district as well as the recycling rates averaged over the timespan of the data. The map also shows the average distance to the nearest public recycling bin by district.

From the map, we see that there definitely are some socio-economic differences between the districts of New York City. For example, we see that the median age, median household income and percentage with a bachelor’s degree are much higher in (lower) Manhattan than in Brooklyn and the Bronx. In contrast, the median number of household rooms is much lower in (lower) Manhattan than in the rest of New York. Finally, we see that the median age is higher in the suburbs on the outskirts of the city, i.e. Staten Island and some parts of Queens.

In general, we see that people living in the Bronx and central Brooklyn are worse off socio-economically than most other New Yorkers, whereas people from lower Manhattan are doing quite well socio-economically and people living in Queens and Staten Island are somewhere in between.

Looking at the average recycling rates by district, it seems that people from Staten Island and Queens are recycling the most whereas people from the Bronx and central Brooklyn are recycling the least. The level of recycling in Manhattan is somewhere in between. Thus it appears that the middle class is recycling the most, whereas the lower and upper classes are worse at recycling. We also see that older people are recycling more, i.e. the level of recycling seems to be higher in areas with a higher median age.

Finally, it seems that there is not really a correlation between the distance to the nearest public recycling bin and the level of recycling.

Recycling and socio-economic factors by district

The bar chart below explores some additional socio-economic factors in relation to the average total recycling rate, and reveals the same patterns as above. For example, we see in the education panel that districts with a high percentage of people with no high-school diploma have low recycling rates and that a high percentage of educated people does not necessarily increase the level of recycling.

In the housing panel, we see that the highest recycling rates are found in districts that have a low percentage of expensive houses and at the same time a low percentage of cheap houses. In contrast, the recycling rates are lower in districts with a high percentage of expensive houses or a high percentage of cheap houses, i.e. we again see that the middle class recycles more than the lower class and upper class.

Bar chart of recycling and socio-economic factors

Prediction of recycling rates

In this section we will showcase the result of predicting the average recycling rate in each community district using a gradient boosting model with the socio-economic factors as input. The figure below shows the feature importances for the trained model. Here we see that variables such as “employed” and “less_than_10k” are quite important to the model. These features describe the percentage of employed people in the district and the percentage with an income less than 10,000 USD, and are thus intimately related to socio-economic status.

It also appears that ethnicity and origin are important factors in recycling behaviour, since features like “born_nyc” (percentage of people born in NYC), “Hsp1p” (percentage of Hispanic population) and “percent_us_citizen” (percentage of population with US citizenship) are important to the model.

We also see that the distance to the nearest public recycling bin is important to the model, even though the previous analysis did not show a clear correlation between this distance and the level of recycling.

Feature importances
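To illustrate how a gradient boosting model arrives at feature importances, here is a from-scratch sketch in plain Python. It is not the model used in this project: it assumes squared loss, decision stumps as weak learners, and importance measured as each feature's share of the total squared-error reduction. All function names and data values are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-feature threshold split of the residuals (squared error).

    Assumes at least one feature takes two or more distinct values.
    """
    mean = sum(residuals) / len(residuals)
    base_sse = sum((r - mean) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    sse, j, t, lm, rm = best
    return j, t, lm, rm, base_sse - sse


def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss with stumps as weak learners."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    pred = [base] * len(y)
    gains = [0.0] * len(X[0])
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm, gain = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        gains[j] += gain
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for row, p in zip(X, pred)]
    total = sum(gains) or 1.0
    return base, stumps, [g / total for g in gains]


# Hypothetical districts: column 0 drives the target, column 1 is noise.
X = [[50, 3], [55, 1], [60, 4], [65, 2], [70, 5], [75, 1]]
y = [0.10, 0.12, 0.14, 0.16, 0.18, 0.20]
base, stumps, importances = gradient_boost(X, y)
```

In this toy data the first column receives most of the importance because the target depends on it alone. Note that an important feature only means the model relies on it for its predictions; it does not by itself show a causal effect.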

In the plot below we can see the Pearson correlation between our target variable and the 8 most relevant features. We can observe the positive correlation for the “employed” variable and the negative one for “less_than_10k”.

This means that the percentage of employed people tends to move in the same direction as the recycling rate: districts where more people have a job will probably also have higher recycling rates.

The exact opposite happens with people who have an annual income of less than 10,000 USD: if a district has many low-income residents, its recycling rate is likely to be low. The correlations are a good indicator for identifying relevant factors and how they relate to the recycling rates.

Correlations
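For reference, the Pearson correlation used here can be computed as in the following sketch. The district values are made up for illustration and are not taken from our data; only the variable names mirror the features discussed above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical district-level values, for illustration only.
employed = [55.0, 60.0, 65.0, 70.0]       # percent employed
less_than_10k = [12.0, 9.0, 7.0, 4.0]     # percent with income < 10,000 USD
avg_recycling_rate = [0.12, 0.15, 0.17, 0.21]

r_employed = pearson(employed, avg_recycling_rate)         # positive
r_low_income = pearson(less_than_10k, avg_recycling_rate)  # negative
```

A coefficient close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates none; in either case the coefficient describes association, not causation.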
---

Conclusion

From our analysis, we have learned that the level of recycling in the 59 community districts in New York City can, to a certain degree, be explained by socio-economic factors. We have learned that the level of recycling is in general highest in districts with people of middle socio-economic status, whereas it is in general lower in districts with people of low or high socio-economic status.

Perhaps the reason for this pattern is that people from both the lower class and the upper class can have long working hours: the lower class might need to work several jobs to make ends meet, and the high-paying jobs of the upper class can also be quite demanding. Thus, these groups of people might not have the surplus energy that recycling requires.

Similarly, we see a higher level of recycling in districts with older people, a group that might have more time on their hands and thus the time required to recycle.

These suggestions are, however, only hypotheses and need to be explored more and verified, which could be the subject of future work.


Explainer notebook

The GitHub link to our explainer notebook is here. Note that some of the output has been cleared in this notebook to comply with the 100 MB file size limit on GitHub. We have also provided a version with all output cleared here. Finally, the notebook in HTML format with all output is available here.